housing_df = read.csv("housing.csv")[,-1]
## first column is Id so it can be removed

Section 1. Data Cleaning

1. Take a summary of the data and explore the result. How many categorical and numerical variables are there in the dataset?

summary(housing_df)
   MSSubClass      MSZoning          LotFrontage        LotArea          Street             Alley             LotShape         LandContour         Utilities          LotConfig        
 Min.   : 20.0   Length:1460        Min.   : 21.00   Min.   :  1300   Length:1460        Length:1460        Length:1460        Length:1460        Length:1460        Length:1460       
 1st Qu.: 20.0   Class :character   1st Qu.: 59.00   1st Qu.:  7554   Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
 Median : 50.0   Mode  :character   Median : 69.00   Median :  9478   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 56.9                      Mean   : 70.05   Mean   : 10517                                                                                                                    
 3rd Qu.: 70.0                      3rd Qu.: 80.00   3rd Qu.: 11602                                                                                                                    
 Max.   :190.0                      Max.   :313.00   Max.   :215245                                                                                                                    
                                    NA's   :259                                                                                                                                        
  LandSlope         Neighborhood        Condition1         Condition2          BldgType          HouseStyle         OverallQual      OverallCond      YearBuilt     YearRemodAdd 
 Length:1460        Length:1460        Length:1460        Length:1460        Length:1460        Length:1460        Min.   : 1.000   Min.   :1.000   Min.   :1872   Min.   :1950  
 Class :character   Class :character   Class :character   Class :character   Class :character   Class :character   1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median : 6.000   Median :5.000   Median :1973   Median :1994  
                                                                                                                   Mean   : 6.099   Mean   :5.575   Mean   :1971   Mean   :1985  
                                                                                                                   3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004  
                                                                                                                   Max.   :10.000   Max.   :9.000   Max.   :2010   Max.   :2010  
                                                                                                                                                                                 
  RoofStyle           RoofMatl         Exterior1st        Exterior2nd         MasVnrType          MasVnrArea      ExterQual          ExterCond          Foundation          BsmtQual        
 Length:1460        Length:1460        Length:1460        Length:1460        Length:1460        Min.   :   0.0   Length:1460        Length:1460        Length:1460        Length:1460       
 Class :character   Class :character   Class :character   Class :character   Class :character   1st Qu.:   0.0   Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median :   0.0   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                                                Mean   : 103.7                                                                              
                                                                                                3rd Qu.: 166.0                                                                              
                                                                                                Max.   :1600.0                                                                              
                                                                                                NA's   :8                                                                                   
   BsmtCond         BsmtExposure       BsmtFinType1         BsmtFinSF1     BsmtFinType2         BsmtFinSF2        BsmtUnfSF       TotalBsmtSF       Heating           HeatingQC        
 Length:1460        Length:1460        Length:1460        Min.   :   0.0   Length:1460        Min.   :   0.00   Min.   :   0.0   Min.   :   0.0   Length:1460        Length:1460       
 Class :character   Class :character   Class :character   1st Qu.:   0.0   Class :character   1st Qu.:   0.00   1st Qu.: 223.0   1st Qu.: 795.8   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Median : 383.5   Mode  :character   Median :   0.00   Median : 477.5   Median : 991.5   Mode  :character   Mode  :character  
                                                          Mean   : 443.6                      Mean   :  46.55   Mean   : 567.2   Mean   :1057.4                                        
                                                          3rd Qu.: 712.2                      3rd Qu.:   0.00   3rd Qu.: 808.0   3rd Qu.:1298.2                                        
                                                          Max.   :5644.0                      Max.   :1474.00   Max.   :2336.0   Max.   :6110.0                                        
                                                                                                                                                                                       
  CentralAir         Electrical          X1stFlrSF      X2ndFlrSF     LowQualFinSF       GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath        HalfBath       BedroomAbvGr  
 Length:1460        Length:1460        Min.   : 334   Min.   :   0   Min.   :  0.000   Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
 Class :character   Class :character   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000   1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000  
 Mode  :character   Mode  :character   Median :1087   Median :   0   Median :  0.000   Median :1464   Median :0.0000   Median :0.00000   Median :2.000   Median :0.0000   Median :3.000  
                                       Mean   :1163   Mean   : 347   Mean   :  5.845   Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565   Mean   :0.3829   Mean   :2.866  
                                       3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000   3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.000  
                                       Max.   :4692   Max.   :2065   Max.   :572.000   Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000   Max.   :2.0000   Max.   :8.000  
                                                                                                                                                                                         
  KitchenAbvGr   KitchenQual         TotRmsAbvGrd     Functional          Fireplaces    FireplaceQu         GarageType         GarageYrBlt   GarageFinish         GarageCars      GarageArea    
 Min.   :0.000   Length:1460        Min.   : 2.000   Length:1460        Min.   :0.000   Length:1460        Length:1460        Min.   :1900   Length:1460        Min.   :0.000   Min.   :   0.0  
 1st Qu.:1.000   Class :character   1st Qu.: 5.000   Class :character   1st Qu.:0.000   Class :character   Class :character   1st Qu.:1961   Class :character   1st Qu.:1.000   1st Qu.: 334.5  
 Median :1.000   Mode  :character   Median : 6.000   Mode  :character   Median :1.000   Mode  :character   Mode  :character   Median :1980   Mode  :character   Median :2.000   Median : 480.0  
 Mean   :1.047                      Mean   : 6.518                      Mean   :0.613                                         Mean   :1979                      Mean   :1.767   Mean   : 473.0  
 3rd Qu.:1.000                      3rd Qu.: 7.000                      3rd Qu.:1.000                                         3rd Qu.:2002                      3rd Qu.:2.000   3rd Qu.: 576.0  
 Max.   :3.000                      Max.   :14.000                      Max.   :3.000                                         Max.   :2010                      Max.   :4.000   Max.   :1418.0  
                                                                                                                              NA's   :81                                                        
  GarageQual         GarageCond         PavedDrive          WoodDeckSF      OpenPorchSF     EnclosedPorch      X3SsnPorch      ScreenPorch        PoolArea          PoolQC         
 Length:1460        Length:1460        Length:1460        Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Length:1460       
 Class :character   Class :character   Class :character   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Median :  0.00   Median : 25.00   Median :  0.00   Median :  0.00   Median :  0.00   Median :  0.000   Mode  :character  
                                                          Mean   : 94.24   Mean   : 46.66   Mean   : 21.95   Mean   :  3.41   Mean   : 15.06   Mean   :  2.759                     
                                                          3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000                     
                                                          Max.   :857.00   Max.   :547.00   Max.   :552.00   Max.   :508.00   Max.   :480.00   Max.   :738.000                     
                                                                                                                                                                                   
    Fence           MiscFeature           MiscVal             MoSold           YrSold       SaleType         SaleCondition        SalePrice     
 Length:1460        Length:1460        Min.   :    0.00   Min.   : 1.000   Min.   :2006   Length:1460        Length:1460        Min.   : 34900  
 Class :character   Class :character   1st Qu.:    0.00   1st Qu.: 5.000   1st Qu.:2007   Class :character   Class :character   1st Qu.:129975  
 Mode  :character   Mode  :character   Median :    0.00   Median : 6.000   Median :2008   Mode  :character   Mode  :character   Median :163000  
                                       Mean   :   43.49   Mean   : 6.322   Mean   :2008                                         Mean   :180921  
                                       3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009                                         3rd Qu.:214000  
                                       Max.   :15500.00   Max.   :12.000   Max.   :2010                                         Max.   :755000  
                                                                                                                                                
length(housing_df[,sapply(housing_df, is.character) == TRUE]) ## Number of categorical variables
[1] 43
length(housing_df[,sapply(housing_df, is.numeric) == TRUE]) ## Number of numerical variables
[1] 37

2. (1pt) Which columns have missing values and what percentage of those columns have NAs? (Note. You can use colMeans(is.na(your data frame)) to find the percentage of NAs in each column).

missing_vals_columns = which(colSums(is.na(housing_df)) > 0)
sort(colSums(sapply(housing_df[missing_vals_columns], is.na)), decreasing = TRUE)
      PoolQC  MiscFeature        Alley        Fence  FireplaceQu  LotFrontage   GarageType  GarageYrBlt GarageFinish   GarageQual   GarageCond BsmtExposure BsmtFinType2     BsmtQual 
        1453         1406         1369         1179          690          259           81           81           81           81           81           38           38           37 
    BsmtCond BsmtFinType1   MasVnrType   MasVnrArea   Electrical 
          37           37            8            8            1 

The above columns have missing values

means = colMeans(sapply(housing_df[missing_vals_columns], is.na))
newmeans = as.numeric(sprintf("%2.3f", means)) * 100
names(newmeans) = names(means)
sort(newmeans, decreasing = TRUE)
      PoolQC  MiscFeature        Alley        Fence  FireplaceQu  LotFrontage   GarageType  GarageYrBlt GarageFinish   GarageQual   GarageCond BsmtExposure BsmtFinType2     BsmtQual 
        99.5         96.3         93.8         80.8         47.3         17.7          5.5          5.5          5.5          5.5          5.5          2.6          2.6          2.5 
    BsmtCond BsmtFinType1   MasVnrType   MasVnrArea   Electrical 
         2.5          2.5          0.5          0.5          0.1 

The above are the % NAs in each column with NAs

3. Is there any obvious outlier in the SalePrice? If so, remove them

## Before transformation
str(housing_df$SalePrice)
 int [1:1460] 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
first_quant = quantile(housing_df$SalePrice)[2]
third_quant = quantile(housing_df$SalePrice)[4]
iqr = third_quant - first_quant
iqr_index = (housing_df$SalePrice > first_quant - 1.5*iqr & housing_df$SalePrice < third_quant + 1.5*iqr)
housing_df = housing_df[iqr_index, ]
## After transformation
str(housing_df$SalePrice)
 int [1:1399] 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...

4. (2pt)Read the data description carefully. For some of the variables, such as PoolQC, FirePlaceQU, Fence, etc. NA means not applicable rather than missing at random. For instance, a house that does not have a pool gets NA for PoolQC. For those variables for which NA means not applicable, you can replace NA with zero ( if that variable is numeric) or replace it with a new category/level, for instance, “notApplicable” if that variable is categorical.

# Cleaning categorical variables
cat_columns = c("Alley","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1","BsmtFinType2","FireplaceQu","GarageType","GarageFinish","GarageQual","GarageCond","PoolQC","Fence","MiscFeature")
new_columns = lapply(cat_columns, function(x){
  housing_df[is.na(housing_df[,x]),x] <<- "notApplicable"
})

# Cleaning numerical variables
num_columns = (sapply(housing_df, is.numeric) == TRUE) & (colSums(is.na(housing_df)) > 0)
new_columns = lapply(names(housing_df[,num_columns]), function(x){
  housing_df[is.na(housing_df[,x]),x] <<- 0
})

5. (1pt) After replacing not applicable NAs with appropriate values, find out which columns still have NAs and what percentage of each column is missing.

missing_vals_columns = which(colSums(is.na(housing_df)) > 0)
means = colMeans(sapply(housing_df[missing_vals_columns], is.na))
newmeans = as.numeric(sprintf("%2.3f", means)) * 100
names(newmeans) = names(means)
sort(newmeans, decreasing = TRUE)
MasVnrType Electrical 
       0.5        0.1 

6. (1pt) what percentage of rows in the dataset have one or more missing values? Use “complete.cases” function to answer this question.

missing_count = length(which(!complete.cases(housing_df)))
as.numeric(sprintf("%2.3f",missing_count/1399)) * 100 # 1399 is the number of observations in the data frame
[1] 0.6

0.6% of rows have missing values

Section 2. Data Exploration

8. (1pt) plot the histogram of SalePrice. Interpret the histogram. Is SalePrice variable skewed? To replace SalePrice with log(SalePrice. Compare the histogram of salesprice before and after log transformation.

hist(housing_df$SalePrice, xlab = "SalePrice", main = "Sale Price")

Yes, the SalePrice variable is right skewed

hist(log(housing_df$SalePrice), xlab = "Log SalePrice", main = "Sale Price")

The histogram of Log SalePrice is left skewed.

9. (2 pt) Use plot (SalePrice~. , data=housing) (replace housing with your dataframe after data cleaning) to draw scatter and side by side box plots of other variables against the Sale Price. From these plots, what variables seem to have correlation with SalePrice? (Note since we have so many variables, you do not need to use statistics tests, you can just answer this question based on your observations of the plots)

attach(housing_df)

The plots for categorical variables

colName = names(housing_df)
correlated_variables = vector(mode = "character")
temp = lapply(colName, function(col) {
  if (is.character(housing_df[,col])) {
      chisq_temp = chisq.test(SalePrice, as.factor(housing_df[,col]))
      if (chisq_temp$p.value < 0.006) {
        correlated_variables <<- append(correlated_variables, col)
      }
      plot(SalePrice ~ as.factor(housing_df[,col]), xlab = col)
  }
  })
Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

non_scatter = c("MSSubClass","OverallQual","OverallCond","BsmtFullBath","BsmtFullBath","FullBath","HalfBath","BedroomAbvGr","KitchenAbvGr","TotRmsAbvGrd","Fireplaces","GarageCars","MoSold","YrSold")

temp1 = lapply(non_scatter, function(col) {
      chisq_temp = chisq.test(SalePrice, as.factor(housing_df[,col]))
      if (chisq_temp$p.value < 0.006) {
        correlated_variables <<- append(correlated_variables, col)
      }
      plot(SalePrice ~ as.factor(housing_df[,col]), xlab = col)
})
Chi-squared approximation may be incorrectChi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

Chi-squared approximation may be incorrect

The scatter plots for numeric variables

col_name = names(housing_df)

scatter_cols = setdiff(col_name, non_scatter)

temp = lapply(scatter_cols, function(col) {
  if (is.numeric(housing_df[,col]) & col != "SalePrice") {
      cor_temp = cor.test(SalePrice, housing_df[,col])
      if (cor_temp$estimate >= 0.5 | cor_temp$estimate <= -0.5) {
        correlated_variables <<- append(correlated_variables, col)
      }
      plot(SalePrice ~ housing_df[,col], xlab = col)
  }
  return()
  })

NA

Below are the variables I’ve interpreted to be correlated with SalePrice

correlated_variables
 [1] "MSZoning"      "Street"        "LotShape"      "Neighborhood"  "Condition2"    "MasVnrType"    "ExterQual"     "ExterCond"     "Foundation"    "BsmtQual"      "BsmtCond"     
[12] "BsmtExposure"  "Heating"       "CentralAir"    "Electrical"    "KitchenQual"   "FireplaceQu"   "GarageFinish"  "SaleType"      "SaleCondition" "OverallQual"   "OverallCond"  
[23] "FullBath"      "HalfBath"      "TotRmsAbvGrd"  "GarageCars"    "YearBuilt"     "YearRemodAdd"  "TotalBsmtSF"   "X1stFlrSF"     "GrLivArea"     "GarageArea"   

10. (2 pt) Examine the columns with missing values to see if any of them are categorical. Use caret’s createDataPartition method to partition the dataset to 80% training and 20% testing. If a categorical column has missing values in train or test data, impute it with the mode of that column in the training data. It is important that the mode is computed based only on the training data only (instead of the entire dataset) to avoid data leakage.

These are the categorical columns with missing values

which(colSums(is.na(housing_df)) > 0 & sapply(housing_df, is.character))
MasVnrType Electrical 
        25         42 
library(caret)
# creating data partition
test_indexes = createDataPartition(y = housing_df$SalePrice, p = 0.8)
test_data = housing_df[test_indexes$Resample1, ]
train_data = housing_df[-test_indexes$Resample1, ]
# define getmode
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Impute test data
masVnrType_na = is.na(test_data$MasVnrType)
electrical_na = is.na(test_data$Electrical)

test_data$MasVnrType[masVnrType_na] = getmode(test_data$MasVnrType)
test_data$Electrical[electrical_na] = getmode(test_data$Electrical)
LS0tCnRpdGxlOiAiSGFuZHMgb24gd2l0aCBSZWd1bGFyaXphdGlvbiBhbmQgVHJlZSBiYXNlZCBNb2RlbCBFbnNlbWJsZSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKCmBgYHtyfQpob3VzaW5nX2RmID0gcmVhZC5jc3YoImhvdXNpbmcuY3N2IilbLC0xXQojIyBmaXJzdCBjb2x1bW4gaXMgSWQgc28gaXQgY2FuIGJlIHJlbW92ZWQKYGBgCgojIyBTZWN0aW9uIDEuIERhdGEgQ2xlYW5pbmcKCl9fMS5fXyBUYWtlIGEgc3VtbWFyeSBvZiB0aGUgZGF0YSBhbmQgZXhwbG9yZSB0aGUgcmVzdWx0LiBIb3cgbWFueSBjYXRlZ29yaWNhbCBhbmQgbnVtZXJpY2FsIHZhcmlhYmxlcyBhcmUKdGhlcmUgaW4gdGhlIGRhdGFzZXQ/CgpgYGB7cn0Kc3VtbWFyeShob3VzaW5nX2RmKQpgYGAKYGBge3J9Cmxlbmd0aChob3VzaW5nX2RmWyxzYXBwbHkoaG91c2luZ19kZiwgaXMuY2hhcmFjdGVyKV0pICMjIE51bWJlciBvZiBjYXRlZ29yaWNhbCB2YXJpYWJsZXMKbGVuZ3RoKGhvdXNpbmdfZGZbLHNhcHBseShob3VzaW5nX2RmLCBpcy5udW1lcmljKV0pICMjIE51bWJlciBvZiBudW1lcmljYWwgdmFyaWFibGVzCmBgYAoKX18yLl9fICgxcHQpIFdoaWNoIGNvbHVtbnMgaGF2ZSBtaXNzaW5nIHZhbHVlcyBhbmQgd2hhdCBwZXJjZW50YWdlIG9mIHRob3NlIGNvbHVtbnMgaGF2ZSBOQXM/IChOb3RlLgpZb3UgY2FuIHVzZSBjb2xNZWFucyhpcy5uYSh5b3VyIGRhdGEgZnJhbWUpKSB0byBmaW5kIHRoZSBwZXJjZW50YWdlIG9mIE5BcyBpbiBlYWNoIGNvbHVtbikuCmBgYHtyfQptaXNzaW5nX3ZhbHNfY29sdW1ucyA9IHdoaWNoKGNvbFN1bXMoaXMubmEoaG91c2luZ19kZikpID4gMCkKc29ydChjb2xTdW1zKHNhcHBseShob3VzaW5nX2RmW21pc3NpbmdfdmFsc19jb2x1bW5zXSwgaXMubmEpKSwgZGVjcmVhc2luZyA9IFRSVUUpCmBgYApUaGUgYWJvdmUgY29sdW1ucyBoYXZlIG1pc3NpbmcgdmFsdWVzCmBgYHtyfQptZWFucyA9IGNvbE1lYW5zKHNhcHBseShob3VzaW5nX2RmW21pc3NpbmdfdmFsc19jb2x1bW5zXSwgaXMubmEpKQpuZXdtZWFucyA9IGFzLm51bWVyaWMoc3ByaW50ZigiJTIuM2YiLCBtZWFucykpICogMTAwCm5hbWVzKG5ld21lYW5zKSA9IG5hbWVzKG1lYW5zKQpzb3J0KG5ld21lYW5zLCBkZWNyZWFzaW5nID0gVFJVRSkKYGBgClRoZSBhYm92ZSBhcmUgdGhlICUgTkFzIGluIGVhY2ggY29sdW1uIHdpdGggTkFzCgpfXzMuX18gSXMgdGhlcmUgYW55IG9idmlvdXMgb3V0bGllciBpbiB0aGUgU2FsZVByaWNlPyBJZiBzbywgcmVtb3ZlIHRoZW0KYGBge3J9CiMjIEJlZm9yZSB0cmFuc2Zvcm1hdGlvbgpzdHIoaG91c2luZ19kZiRTYWxlUHJpY2UpCmZpcnN0X3F1YW50ID0gcXVhbnRpbGUoaG91c2luZ19kZiRTYWxlUHJpY2UpWzJdCnRoaXJkX3F1YW50ID0gcXVhbnRpbGUoaG91c2luZ19kZiRTYWxlUHJpY2UpWzRdCmlxciA9IHRoaXJkX3F1YW50IC0gZmlyc3RfcXVhbnQKaXFyX2luZGV4ID0gKGhvdXNpbmdfZGYkU2FsZVByaWNlID4gZmlyc3RfcXVhbnQgLSAxLjUqaXFyICYgaG91c2luZ19kZiRTYWxlUHJpY2UgPCB0aGlyZF9xdWFudCArIDEuNSppcXIpCmhvdXNpbmdfZGYgPSBob3VzaW5nX2RmW2lxcl9pbmRleCwgXQojIyBBZnRlciB0cmFuc2Zvcm1hdGlvbgpzdHIoaG91c2luZ19kZiRTYWxlUHJpY2UpCmBgYAoKX180Ll9fICgycHQpUmVhZCB0aGUgZGF0YSBkZXNjcmlwdGlvbiBjYXJlZnVsbHkuIEZvciBzb21lIG9mIHRoZSB2YXJpYWJsZXMsIHN1Y2ggYXMgUG9vbFFDLCBGaXJlUGxhY2VRVSwKRmVuY2UsIGV0Yy4gTkEgbWVhbnMgbm90IGFwcGxpY2FibGUgcmF0aGVyIHRoYW4gbWlzc2luZyBhdCByYW5kb20uIEZvciBpbnN0YW5jZSwgYSBob3VzZSB0aGF0IGRvZXMKbm90IGhhdmUgYSBwb29sIGdldHMgTkEgZm9yIFBvb2xRQy4gRm9yIHRob3NlIHZhcmlhYmxlcyBmb3Igd2hpY2ggTkEgbWVhbnMgbm90IGFwcGxpY2FibGUsIHlvdQpjYW4gcmVwbGFjZSBOQSB3aXRoIHplcm8gKCBpZiB0aGF0IHZhcmlhYmxlIGlzIG51bWVyaWMpIG9yIHJlcGxhY2UgaXQgd2l0aCBhIG5ldyBjYXRlZ29yeS9sZXZlbCwgZm9yCmluc3RhbmNlLCDigJxub3RBcHBsaWNhYmxl4oCdIGlmIHRoYXQgdmFyaWFibGUgaXMgY2F0ZWdvcmljYWwuCgpgYGB7cn0KIyBDbGVhbmluZyBjYXRlZ29yaWNhbCB2YXJpYWJsZXMKY2F0X2NvbHVtbnMgPSBjKCJBbGxleSIsIkJzbXRRdWFsIiwiQnNtdENvbmQiLCJCc210RXhwb3N1cmUiLCJCc210RmluVHlwZTEiLCJCc210RmluVHlwZTIiLCJGaXJlcGxhY2VRdSIsIkdhcmFnZVR5cGUiLCJHYXJhZ2VGaW5pc2giLCJHYXJhZ2VRdWFsIiwiR2FyYWdlQ29uZCIsIlBvb2xRQyIsIkZlbmNlIiwiTWlzY0ZlYXR1cmUiKQpuZXdfY29sdW1ucyA9IGxhcHBseShjYXRfY29sdW1ucywgZnVuY3Rpb24oeCl7CiAgaG91c2luZ19kZltpcy5uYShob3VzaW5nX2RmWyx4XSkseF0gPDwtICJub3RBcHBsaWNhYmxlIgp9KQoKIyBDbGVhbmluZyBudW1lcmljYWwgdmFyaWFibGVzCm51bV9jb2x1bW5zID0gKHNhcHBseShob3VzaW5nX2RmLCBpcy5udW1lcmljKSA9PSBUUlVFKSAmIChjb2xTdW1zKGlzLm5hKGhvdXNpbmdfZGYpKSA+IDApCm5ld19jb2x1bW5zID0gbGFwcGx5KG5hbWVzKGhvdXNpbmdfZGZbLG51bV9jb2x1bW5zXSksIGZ1bmN0aW9uKHgpewogIGhvdXNpbmdfZGZbaXMubmEoaG91c2luZ19kZlsseF0pLHhdIDw8LSAwCn0pCmBgYAoKX181Ll9fICgxcHQpIEFmdGVyIHJlcGxhY2luZyBub3QgYXBwbGljYWJsZSBOQXMgd2l0aCBhcHByb3ByaWF0ZSB2YWx1ZXMsIGZpbmQgb3V0IHdoaWNoIGNvbHVtbnMgc3RpbGwKaGF2ZSBOQXMgYW5kIHdoYXQgcGVyY2VudGFnZSBvZiBlYWNoIGNvbHVtbiBpcyBtaXNzaW5nLgoKYGBge3J9Cm1pc3NpbmdfdmFsc19jb2x1bW5zID0gd2hpY2goY29sU3Vtcyhpcy5uYShob3VzaW5nX2RmKSkgPiAwKQptZWFucyA9IGNvbE1lYW5zKHNhcHBseShob3VzaW5nX2RmW21pc3NpbmdfdmFsc19jb2x1bW5zXSwgaXMubmEpKQpuZXdtZWFucyA9IGFzLm51bWVyaWMoc3ByaW50ZigiJTIuM2YiLCBtZWFucykpICogMTAwCm5hbWVzKG5ld21lYW5zKSA9IG5hbWVzKG1lYW5zKQpzb3J0KG5ld21lYW5zLCBkZWNyZWFzaW5nID0gVFJVRSkKYGBgCgpfXzYuX18gKDFwdCkgd2hhdCBwZXJjZW50YWdlIG9mIHJvd3MgaW4gdGhlIGRhdGFzZXQgaGF2ZSBvbmUgb3IgbW9yZSBtaXNzaW5nIHZhbHVlcz8gVXNlCuKAnGNvbXBsZXRlLmNhc2Vz4oCdIGZ1bmN0aW9uIHRvIGFuc3dlciB0aGlzIHF1ZXN0aW9uLgoKYGBge3J9Cm1pc3NpbmdfY291bnQgPSBsZW5ndGgod2hpY2goIWNvbXBsZXRlLmNhc2VzKGhvdXNpbmdfZGYpKSkKYXMubnVtZXJpYyhzcHJpbnRmKCIlMi4zZiIsbWlzc2luZ19jb3VudC8xMzk5KSkgKiAxMDAgIyAxMzk5IGlzIHRoZSBudW1iZXIgb2Ygb2JzZXJ2YXRpb25zIGluIHRoZSBkYXRhIGZyYW1lCmBgYAowLjYlIG9mIHJvd3MgaGF2ZSBtaXNzaW5nIHZhbHVlcwoKIyMgU2VjdGlvbiAyLiBEYXRhIEV4cGxvcmF0aW9uCgpfXzguX18gKDFwdCkgcGxvdCB0aGUgaGlzdG9ncmFtIG9mIFNhbGVQcmljZS4gSW50ZXJwcmV0IHRoZSBoaXN0b2dyYW0uIElzIFNhbGVQcmljZSB2YXJpYWJsZSBza2V3ZWQ/ClRvIHJlcGxhY2UgU2FsZVByaWNlIHdpdGggbG9nKFNhbGVQcmljZS4gQ29tcGFyZSB0aGUgaGlzdG9ncmFtIG9mIHNhbGVzcHJpY2UgYmVmb3JlIGFuZCBhZnRlciBsb2cKdHJhbnNmb3JtYXRpb24uCgpgYGB7cn0KaGlzdChob3VzaW5nX2RmJFNhbGVQcmljZSwgeGxhYiA9ICJTYWxlUHJpY2UiLCBtYWluID0gIlNhbGUgUHJpY2UiKQpgYGAKWWVzLCB0aGUgU2FsZVByaWNlIHZhcmlhYmxlIGlzIHJpZ2h0IHNrZXdlZAoKYGBge3J9Cmhpc3QobG9nKGhvdXNpbmdfZGYkU2FsZVByaWNlKSwgeGxhYiA9ICJMb2cgU2FsZVByaWNlIiwgbWFpbiA9ICJTYWxlIFByaWNlIikKYGBgCgpUaGUgaGlzdG9ncmFtIG9mIExvZyBTYWxlUHJpY2UgaXMgbGVmdCBza2V3ZWQuCgpfXzkuX18gKDIgcHQpIFVzZSBwbG90IChTYWxlUHJpY2V+LiAsIGRhdGE9aG91c2luZykgKHJlcGxhY2UgaG91c2luZyB3aXRoIHlvdXIgZGF0YWZyYW1lIGFmdGVyIGRhdGEKY2xlYW5pbmcpIHRvIGRyYXcgc2NhdHRlciBhbmQgc2lkZSBieSBzaWRlIGJveCBwbG90cyBvZiBvdGhlciB2YXJpYWJsZXMgYWdhaW5zdCB0aGUgU2FsZSBQcmljZS4gRnJvbSB0aGVzZQpwbG90cywgd2hhdCB2YXJpYWJsZXMgc2VlbSB0byBoYXZlIGNvcnJlbGF0aW9uIHdpdGggU2FsZVByaWNlPyAoTm90ZSBzaW5jZSB3ZSBoYXZlIHNvIG1hbnkgdmFyaWFibGVzLAp5b3UgZG8gbm90IG5lZWQgdG8gdXNlIHN0YXRpc3RpY3MgdGVzdHMsIHlvdSBjYW4ganVzdCBhbnN3ZXIgdGhpcyBxdWVzdGlvbiBiYXNlZCBvbiB5b3VyIG9ic2VydmF0aW9ucyBvZgp0aGUgcGxvdHMpCgpgYGB7cn0KYXR0YWNoKGhvdXNpbmdfZGYpCmBgYAoKVGhlIHBsb3RzIGZvciBjYXRlZ29yaWNhbCB2YXJpYWJsZXMKYGBge3J9CmNvbE5hbWUgPSBuYW1lcyhob3VzaW5nX2RmKQpjb3JyZWxhdGVkX3ZhcmlhYmxlcyA9IHZlY3Rvcihtb2RlID0gImNoYXJhY3RlciIpCnRlbXAgPSBsYXBwbHkoY29sTmFtZSwgZnVuY3Rpb24oY29sKSB7CiAgaWYgKGlzLmNoYXJhY3Rlcihob3VzaW5nX2RmWyxjb2xdKSkgewogICAgICBjaGlzcV90ZW1wID0gY2hpc3EudGVzdChTYWxlUHJpY2UsIGFzLmZhY3Rvcihob3VzaW5nX2RmWyxjb2xdKSkKICAgICAgaWYgKGNoaXNxX3RlbXAkcC52YWx1ZSA8IDAuMDA2KSB7CiAgICAgICAgY29ycmVsYXRlZF92YXJpYWJsZXMgPDwtIGFwcGVuZChjb3JyZWxhdGVkX3ZhcmlhYmxlcywgY29sKQogICAgICB9CiAgICAgIHBsb3QoU2FsZVByaWNlIH4gYXMuZmFjdG9yKGhvdXNpbmdfZGZbLGNvbF0pLCB4bGFiID0gY29sKQogIH0KICB9KQoKbm9uX3NjYXR0ZXIgPSBjKCJNU1N1YkNsYXNzIiwiT3ZlcmFsbFF1YWwiLCJPdmVyYWxsQ29uZCIsIkJzbXRGdWxsQmF0aCIsIkJzbXRGdWxsQmF0aCIsIkZ1bGxCYXRoIiwiSGFsZkJhdGgiLCJCZWRyb29tQWJ2R3IiLCJLaXRjaGVuQWJ2R3IiLCJUb3RSbXNBYnZHcmQiLCJGaXJlcGxhY2VzIiwiR2FyYWdlQ2FycyIsIk1vU29sZCIsIllyU29sZCIpCgp0ZW1wMSA9IGxhcHBseShub25fc2NhdHRlciwgZnVuY3Rpb24oY29sKSB7CiAgICAgIGNoaXNxX3RlbXAgPSBjaGlzcS50ZXN0KFNhbGVQcmljZSwgYXMuZmFjdG9yKGhvdXNpbmdfZGZbLGNvbF0pKQogICAgICBpZiAoY2hpc3FfdGVtcCRwLnZhbHVlIDwgMC4wMDYpIHsKICAgICAgICBjb3JyZWxhdGVkX3ZhcmlhYmxlcyA8PC0gYXBwZW5kKGNvcnJlbGF0ZWRfdmFyaWFibGVzLCBjb2wpCiAgICAgIH0KICAgICAgcGxvdChTYWxlUHJpY2UgfiBhcy5mYWN0b3IoaG91c2luZ19kZlssY29sXSksIHhsYWIgPSBjb2wpCn0pCmBgYApUaGUgc2NhdHRlciBwbG90cyBmb3IgbnVtZXJpYyB2YXJpYWJsZXMKYGBge3J9CmNvbF9uYW1lID0gbmFtZXMoaG91c2luZ19kZikKCnNjYXR0ZXJfY29scyA9IHNldGRpZmYoY29sX25hbWUsIG5vbl9zY2F0dGVyKQoKdGVtcCA9IGxhcHBseShzY2F0dGVyX2NvbHMsIGZ1bmN0aW9uKGNvbCkgewogIGlmIChpcy5udW1lcmljKGhvdXNpbmdfZGZbLGNvbF0pICYgY29sICE9ICJTYWxlUHJpY2UiKSB7CiAgICAgIGNvcl90ZW1wID0gY29yLnRlc3QoU2FsZVByaWNlLCBob3VzaW5nX2RmWyxjb2xdKQogICAgICBpZiAoY29yX3RlbXAkZXN0aW1hdGUgPj0gMC41IHwgY29yX3RlbXAkZXN0aW1hdGUgPD0gLTAuNSkgewogICAgICAgIGNvcnJlbGF0ZWRfdmFyaWFibGVzIDw8LSBhcHBlbmQoY29ycmVsYXRlZF92YXJpYWJsZXMsIGNvbCkKICAgICAgfQogICAgICBwbG90KFNhbGVQcmljZSB+IGhvdXNpbmdfZGZbLGNvbF0sIHhsYWIgPSBjb2wpCiAgfQogIHJldHVybigpCiAgfSkKCmBgYApCZWxvdyBhcmUgdGhlIHZhcmlhYmxlcyBJJ3ZlIGludGVycHJldGVkIHRvIGJlIGNvcnJlbGF0ZWQgd2l0aCBTYWxlUHJpY2UKYGBge3J9CmNvcnJlbGF0ZWRfdmFyaWFibGVzCmBgYAoKX18xMC5fXyAoMiBwdCkgRXhhbWluZSB0aGUgY29sdW1ucyB3aXRoIG1pc3NpbmcgdmFsdWVzIHRvIHNlZSBpZiBhbnkgb2YgdGhlbSBhcmUgY2F0ZWdvcmljYWwuClVzZSBjYXJldOKAmXMgY3JlYXRlRGF0YVBhcnRpdGlvbiBtZXRob2QgdG8gcGFydGl0aW9uIHRoZSBkYXRhc2V0IHRvIDgwJSB0cmFpbmluZyBhbmQgMjAlIHRlc3RpbmcuIElmIGEgY2F0ZWdvcmljYWwgY29sdW1uIGhhcyBtaXNzaW5nIHZhbHVlcyBpbiB0cmFpbiBvciB0ZXN0IGRhdGEsIGltcHV0ZSBpdCB3aXRoIHRoZSBtb2RlIG9mIHRoYXQgY29sdW1uIGluIHRoZSB0cmFpbmluZyBkYXRhLiBJdCBpcyBpbXBvcnRhbnQgdGhhdCB0aGUgbW9kZSBpcyBjb21wdXRlZCBiYXNlZCBvbmx5IG9uIHRoZSB0cmFpbmluZyBkYXRhIG9ubHkgKGluc3RlYWQgb2YgdGhlIGVudGlyZSBkYXRhc2V0KSB0byBhdm9pZCBkYXRhIGxlYWthZ2UuCgpUaGVzZSBhcmUgdGhlIGNhdGVnb3JpY2FsIGNvbHVtbnMgd2l0aCBtaXNzaW5nIHZhbHVlcwpgYGB7cn0Kd2hpY2goY29sU3Vtcyhpcy5uYShob3VzaW5nX2RmKSkgPiAwICYgc2FwcGx5KGhvdXNpbmdfZGYsIGlzLmNoYXJhY3RlcikpCmBgYApgYGB7cn0KbGlicmFyeShjYXJldCkKYGBgCgoKYGBge3J9CiMgY3JlYXRpbmcgZGF0YSBwYXJ0aXRpb24KdGVzdF9pbmRleGVzID0gY3JlYXRlRGF0YVBhcnRpdGlvbih5ID0gaG91c2luZ19kZiRTYWxlUHJpY2UsIHAgPSAwLjgpCnRlc3RfZGF0YSA9IGhvdXNpbmdfZGZbdGVzdF9pbmRleGVzJFJlc2FtcGxlMSwgXQp0cmFpbl9kYXRhID0gaG91c2luZ19kZlstdGVzdF9pbmRleGVzJFJlc2FtcGxlMSwgXQpgYGAKCmBgYHtyfQojIGRlZmluZSBnZXRtb2RlCmdldG1vZGUgPC0gZnVuY3Rpb24odikgewogICB1bmlxdiA8LSB1bmlxdWUodikKICAgdW5pcXZbd2hpY2gubWF4KHRhYnVsYXRlKG1hdGNoKHYsIHVuaXF2KSkpXQp9CmBgYAoKYGBge3J9CiMgSW1wdXRlIHRlc3QgZGF0YQptYXNWbnJUeXBlX25hID0gaXMubmEodGVzdF9kYXRhJE1hc1ZuclR5cGUpCmVsZWN0cmljYWxfbmEgPSBpcy5uYSh0ZXN0X2RhdGEkRWxlY3RyaWNhbCkKCnRlc3RfZGF0YSRNYXNWbnJUeXBlW21hc1ZuclR5cGVfbmFdID0gZ2V0bW9kZSh0ZXN0X2RhdGEkTWFzVm5yVHlwZSkKdGVzdF9kYXRhJEVsZWN0cmljYWxbZWxlY3RyaWNhbF9uYV0gPSBnZXRtb2RlKHRlc3RfZGF0YSRFbGVjdHJpY2FsKQpgYGAKCg==